Applying Data Mining Techniques in Text Analysis

Authors

  • Helena Ahonen
  • Oskari Heinonen
  • Mika Klemettinen
  • A. Inkeri Verkamo
Abstract

A number of recent data mining techniques have been targeted especially for the analysis of sequential data. Traditional examples of sequential data involve telecommunication alarms, WWW log files, user action registration for HCI studies, or any other series of events consisting of an event type and a time of occurrence. Text can also be seen as sequential data, in many respects similar to the data collected by sensors or other observation systems. Traditionally, texts have been analysed using various information retrieval related methods, such as full-text analysis and natural language processing. However, only few examples of data mining in text, particularly in full text, are available. In this paper we show that general data mining methods are applicable to text analysis tasks under certain conditions. Moreover, we present a general framework for text mining. The framework follows the general KDD process, thus containing steps from preprocessing to the utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We consider preprocessing of the text to be essential in text mining: by shifting the focus in the preprocessing phase, data mining can be used to obtain results for various purposes. We give concrete examples of how to preprocess texts based on the intended use of the discovered results and how to balance preprocessing with postprocessing. We also present example applications, including search for key words, key phrases and other co-occurring words, e.g., collocations and generalized concordances. These applications are both common and relevant tasks in information retrieval and natural language processing. We also present results from real-life data experiments to show that our approach is applicable in practice.
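
The idea of reading text as an event sequence can be illustrated with a minimal Python sketch. It is not the authors' episode-mining method: it only treats each word as an event type and its position as the time of occurrence, then counts ordered word pairs that fall within a small window. The function names `text_to_events` and `frequent_pairs`, and the parameters `window` and `min_count`, are hypothetical and chosen for illustration only.

```python
from collections import Counter


def text_to_events(text):
    """Turn text into (word, position) events; the position plays the role of time."""
    # Simplistic preprocessing (an assumption): lowercase and strip punctuation.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    return [(w, i) for i, w in enumerate(words)]


def frequent_pairs(events, window=3, min_count=2):
    """Count ordered pairs (a before b) whose positions are less than `window` apart.

    This is a simplified stand-in for frequent serial episodes of size two.
    """
    counts = Counter()
    for i, (a, ta) in enumerate(events):
        for b, tb in events[i + 1:]:
            if tb - ta >= window:
                break  # events are ordered by position, so later ones are farther away
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

Changing the preprocessing step (for example, replacing words with their stems or part-of-speech tags) changes which pairs become frequent, which is one way to picture how shifting the focus in preprocessing yields results for different purposes.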

Similar resources

A Model for Extracting Information from Text Documents, Based on Text Mining in the Field of E-Learning

As computer networks become the backbones of science and economy, enormous quantities of documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...

A Mutually Beneficial Integration of Data Mining and Information Extraction

Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DISCOTEX, that combines IE and data mining methodologies to perform text mining as well as...

Text Mining in Analyzing the Presentation of Educational Trainers

This work deals with text analysis that involves information retrieval through lexical analysis to learn word occurrence and distributions, pattern recognition, information extraction, and data mining techniques, followed by visualization and predictive analytics. The primary goal is to turn text into data for analysis, through application of natural language processing (NLP) and analytical too...

Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms

Background and Aim: Since web surfing has entered the lifestyle of a vast majority of people in society, and more accurate social and cultural policy making is needed in the field, the authors set out to analyze the behavior of users in viewing different websites so as to help politicians and practitioners. Methods: A design science research method is used in this research...

Prediction of user's trustworthiness in web-based social networks via text mining

In social networks, users need a proper estimate of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...

An Approach for Managing and Organizing Text Documents Using Intelligent Text Analysis

Given that stored data occupies a large space in organizations' retention and information management systems, which has resulted in gigantic data warehouses, the need for extracting an appropriate model is increasingly felt. Text mining is one of the most significant methods for extracting a useful and appropriate model that helps organizations in achieving their goals throug...


Publication date: 1997